Sample Bounded Distributed Reinforcement Learning for Decentralized POMDPs
Abstract
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. We propose a distributed reinforcement learning approach, where agents take turns to learn best responses to each other's policies. This promotes decentralization of the policy computation problem and relaxes reliance on full knowledge of the problem parameters. We derive the relation between the sample complexity of best-response learning and the error tolerance. Our key contribution is to show that the sample complexity could grow exponentially with the horizon. We show empirically that even if the sample requirement is set lower than what theory demands, our approach can produce (near) optimal policies in some benchmark Dec-POMDP problems.
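The turn-taking scheme sketched in the abstract — one agent holds its policy fixed while the other learns a best response from a bounded number of sampled interactions — can be illustrated on a toy one-shot coordination game. The game, the function names, and all parameter values below are illustrative assumptions, not the paper's Dec-POMDP benchmarks or its actual algorithm:

```python
import random

# Hypothetical two-agent coordination game: agents are rewarded only for
# matching actions, and joint action (1, 1) pays more than (0, 0).
PAYOFF = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
ACTIONS = [0, 1]

def learn_best_response(other_policy, samples=500, eps=0.1, alpha=0.1, rng=None):
    """Learn a best response to the other agent's fixed stochastic policy
    by epsilon-greedy Q-learning from a bounded number of samples."""
    rng = rng or random.Random(0)
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(samples):
        # Epsilon-greedy action for the learner; the opponent samples
        # from its fixed policy (a distribution over ACTIONS).
        a = rng.choice(ACTIONS) if rng.random() < eps else max(q, key=q.get)
        b = rng.choices(ACTIONS, weights=other_policy)[0]
        q[a] += alpha * (PAYOFF[(a, b)] - q[a])
    best = max(q, key=q.get)
    return [1.0 if a == best else 0.0 for a in ACTIONS]  # greedy policy

def alternating_best_responses(rounds=4, samples=500):
    """Agents take turns: in each round, one agent's policy is frozen and
    the other learns a best response to it from sampled play."""
    # Arbitrary starting policies: agent 0 plays action 0, agent 1 plays 1.
    policies = [[1.0, 0.0], [0.0, 1.0]]
    rng = random.Random(42)
    for i in range(rounds):
        learner = i % 2
        policies[learner] = learn_best_response(
            policies[1 - learner], samples=samples, rng=rng)
    return policies
```

In this toy run the alternating best responses settle on both agents playing action 1, the higher-paying coordination point; with other starting policies the same scheme can settle on the lower-paying equilibrium instead, which is the local-optimality caveat inherent to best-response dynamics.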
Similar resources
Pruning for Monte Carlo Distributed Reinforcement Learning in Decentralized POMDPs
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. Recently a Monte Carlo based distributed reinforcement learning approach was proposed, where agents take turns to learn best response...
Solving Finite Horizon Decentralized POMDPs by Distributed Reinforcement Learning
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. We propose a distributed reinforcement learning approach, where agents take turns to learn best responses to each other’s policies. T...
Bounded Dynamic Programming for Decentralized POMDPs
Solving decentralized POMDPs (DEC-POMDPs) optimally is a very hard problem. As a result, several approximate algorithms have been developed, but these do not have satisfactory error bounds. In this paper, we first discuss optimal dynamic programming and some approximate finite horizon DEC-POMDP algorithms. We then present a bounded dynamic programming algorithm. Given a problem and an error bou...
Optimizing Memory-Bounded Controllers for Decentralized POMDPs
We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally opti...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v26i1.8260